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Abstract 

Linear systems with large differences between coefficients ("discontinuous coefficients") 
^ arise in many cases in which partial differential equations (PDEs) model physical phenom- 

ena involving heterogeneous media. The standard approach to solving such problems is 
O to use domain decomposition (DD) techniques, with domain boundaries conforming to the 

^ boundaries between the different media. This approach can be difficult to implement when 

^ the geometry of the domain boundaries is compUcated or the grid is unstructured. This 

^ work examines the simple preconditioning technique of scaling the equations by dividing 

^ each equation by the L^-norm of its coefficients. This preconditioning is called geomet- 

ric scahng (GS). It has long been known that diagonal scahng can be useful in improving 
convergence, but there is no study on the general usefulness of this approach for discontin- 
uous coefficients. GS was tested on several nonsymmetric linear systems with discontin- 
^ uous coefficients derived from convection-diffusion elliptic PDEs with small to moderate 

^ convection terms. It is shown that GS improved the convergence properties of restarted 

,0, GMRES and Bi-CGSTAB, with and without the ILUT preconditioner. GS was also shown 

to improve the distribution of the eigenvalues by reducing their concentration around the 
^ origin very significantly. 

^ Keywords. Bi-CGSTAB; diagonal scaling; discontinuous coefficients; domain decomposition; 

CA geometric scaling; GMRES; GS; Lp-norm; linear equations; nonsymmetric equations; parallel 

^ processing; partial differential equations. 
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> 1 Introduction 

Many physical phenomena are modeled by partial differential equations (PDEs) which describe 
the relations between one or more scalar or vector fields and the surrounding media. When 
boundary conditions are prescribed, a common approach to achieving a numerical solution is 
to impose a grid and discretize the equations, thus getting a system of linear equations. In 
some cases, this approach yields a system of equations with very large differences between 
coefficients of the equations. Examples of such systems arise in modeling flow through het- 
erogeneous media with widely-varying permeability, oil reservoir simulation, electromagnetics 
and semiconductor modeling. Such systems are commonly referred to as systems with "discon- 
tinuous coefficients". 
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One of the most common methods for tackling such problems is the domain-decomposition 
(DD) approach, in which the domain is partitioned into subdomains, with the subdomain bound- 
aries conforming to the boundaries between the different media. DD techniques typically op- 
erate as follows: some boundary conditions are assumed to exist on the interfaces between 
subdomains, and a solution of the equations in each subdomain is obtained, often with only low 
accuracy. The boundary conditions at the interfaces are then updated according to these solu- 
tions, and the process is repeated until convergence is achieved. There exists a vast literature 
on this subject, and a review of the area is beyond the scope of this work; see, for example, 
m [181 [El [Ml . DD techniques may be difficult to implement on unstructured grids or when the 
boundaries between domains have a complicated geometry. 

We consider nonsymmetric linear systems with discontinuous coefficients derived from convection- 
diffusion elliptic PDEs with small to moderate convection terms. It is shown that a simple pre- 
conditioning technique, which we call geometric scaling (GS), can be applied to the system of 
equations in order to improve the convergence properties of algorithms applied to the system. 
GS(p), with an integer parameter p > I, consists of dividing each equation by the L^-norm 
of its vector of coefficients. GS, which is a particular form of diagonal scaling on the left, is 
geometric in the following sense: after applying GS(;?) to a linear system, all the rows of the 
system matrix have a unity L^-norm. 

In order to examine the general usefulness of GS in conjunction with various solution methods, 
we tested it with the two leading Krylov subspace solvers for nonsymmetric systems: GMRES 
lITTl and Bi-CGSTAB [21 J. Both algorithms were tested with and without the ILUT precon- 
ditioner [ [TSl on several nonsymmetric problems taken from (or based on) Saad [16J, van der 
Vorst [21 J, and Graham and Hagger [[H. Four basic problems were considered, as well as sev- 
eral variations of the problems, such as modifications of the differences between the coefficients 
and various grid sizes. Results are provided for three different levels of accuracy. 

Our experiments used the L2-norm, but the Li-norm yielded similar results. Our results can be 
characterized as follows: 

• In most cases, when the tested method (algorithm/preconditioner combination) converges 
to the specified accuracy criterion, GS speeds up the convergence time. 

• In many cases, when the tested method stagnates or diverges on the original system, it 
converges on the scaled system. 

• GS was not particularly useful on problems without discontinuous coefficients. 

• GS was not helpful on problems derived from PDEs with large convection terms. Such 
PDEs produce linear systems with very large off-diagonal elements, and these require 
other solution methods. 

We also provide data on the effect of GS on the distribution of the eigenvalues. It is generally 
considered to be detrimental to the convergence of Krylov subspace methods to have a large 
accumulation of eigenvalues near the origin. In the cases examined here, there is a large accu- 
mulation of eigenvalues near the origin, and GS appears to "push" many eigenvalues away from 
the origin. Also, the eigenvalues of the scaled system appear to be distributed similarly to those 
obtained without scaling but with equal-sized coefficients. 

Note that the above choice of algorithm/preconditioner combinations does not imply that these 
methods are optimal for the above problems. Finding an optimal method is problem-dependent 
and a topic for further research. 
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Diagonal scaling is not new, and its usefulness for discontinuous coefficients has also been 
noted. Van der Sluis [20] deals with the effect of scaling on the condition number of matrices. 
Widlund [22, p. 34-35] writes that well-scaled ADI methods give good rates of convergence 
when the coefficients of elliptic problems vary very much in magnitude (ADI stands for the 
alternating direction implicit method [12]). Graham and Hagger [|8l p. 2042-2043] write that 
diagonal scaling has been observed in practical computations to be very effective as a precon- 
ditioner for problems with discontinuous coefficients. Duff and van der Vorst [2] write that 
on vector machines, diagonal scaling is often competitive with other approaches. Regarding 
diagonal scaling of symmetric matrices, see Meurant [[TTl Th. 8.1, 8.2], who also has some 
reservations about the usefulness of diagonal scaling for parallel processing [fTTl p. 401]. Gam- 
bolati et al. [3] use the least square logarithm (LSL) scaling on the rows and the columns of the 
system matrix for a certain problem in geomechanics with discontinuous coefficients. 

However, most previous works are concerned with symmetric systems and diagonal scaling 
refers to multiplying the system matrix on the left and on the right by the same diagonal matrix, 
in order to preserve symmetry. Also, there is no study on the general usefulness of geometric 
scaling for nonsymmetric problems with discontinuous coefficients. The idea of geometric 
scaling was found to be useful for certain problems in image reconstruction from projections; 
see [|7J. 

DD methods are often mentioned in connection with parallel processing. The equations of dif- 
ferent domains can, in principle, be solved in parallel by different processors. However, the 
different domains that arise in many practical situations do not necessarily lead to an optimal 
load-balancing assignment of equations to processors. With the GS approach, inherently par- 
allel algorithms such as Bi-CGSTAB and GMRES can be implemented efficiently in parallel 
without regard to subdomain boundaries. Needless to say, GS is inherently parallel. 

The rest of this paper is organized as follows. [|2] presents some essential background. [Q- 
^deal with the four different problems, and [|7] concludes with a summary and some future 
research directions. 



2 General background 

Throughout the rest of the paper, we assume that all vectors are column vectors, and we use the 
following notation: (p, q) denotes the dot product of two vectors p and q, which is also p^q. 

Given a vector jc = (xi, . . . ,XnY ^ ^\ we denote its Lp-norm by \\x\\p = {x\-\ [-XnY/P. For 

p = 2, we will omit the index and just write ||jc|| = ||jc||2 = a/ {x,x). If A is an nxn matrix, we 
denote by a/ the zth row-vector of A; i.e., a,- = {an , . . . , a,,,)^. 

Consider a system of n hnear equations in n variables: 

n 

'^aijXj = bi for l<i<n, or, in matrix form: Ax = b. (2.1) 



We shall assume throughout that (2.1 1 is consistent and that A does not contain a row of zeros. 
For p > 1, we define a diagonal matrix D = diag(l/||ai ||p, . . . , l/||fl„||p). The geometrically- 
scaled system is defined as 

DAx = Db. (2.2) 
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In some algorithms, GS{p) is an inherent step in the following sense: either the scaling is exe- 
cuted at the beginning as an intrinsic part of the algorithm, or, executing the algorithm produces 
identical results to those obtained when GS(p) is done at the beginning. As an example, it 
is easy to see that GS(2) is inherent in the Kaczmarz algorithm (KACZ) [9J. KACZ can be 
described geometrically as follows: starting from an arbitrary point x° e R" as the initial iter- 
ate, KACZ projects the current iterate orthogonally towards a hyperplane defined by one of the 
equations. The hyperplanes are chosen in cyclic order. 

The tests were run using the AZTEC software library [fT9l , which includes a wide range of 
algorithms and preconditioners, suitable for sequential and parallel implementations. Geometric 
scaling with the Li-norm is a built-in option in AZTEC (where it is called "row-sum scaling"). 
As mentioned we used Bi-CGSTAB and GMRES, with and without ILUT. Restarted GMRES 
was used with Krylov subspace size of 10. In AZTEC, GMRES is implemented with a double 
classical Gram-Schmidt orthogonalization step. ILUT was implemented with AZTEC's default 
parameters of drop tolerance = and fill-in = 1 . These parameters are not necessarily optimal 
for the tested examples, but since our purpose was to demonstrate the general usefulness of 
GS, we stuck with a commonly-used restart value for GMRES and AZTEC's default values 
for ILUT. No doubt, better convergence properties and runtimes would be obtained if these 
parameters were fine-tuned to each problem. The eigenvalue computations were done with the 
LAR^CK linear algebra package LI Oil . 

We also experimented with the option of multiplying the system matrix A by D2 on the left and 
on the right, resulting in the system D^AD^y = D2b, with x = D2y. In one problem, the results 
were somewhat poorer than those obtained with GS, but otherwise, the results were similar. 
Note that this option is computationally more expensive. Gambolati et al. [3| remark that when 
using ILU(O), the iteration matrix of Bi-CGSTAB on a diagonally scaled system (on the left 
and/or the right) is theoretically the same as the one with ILU(O) on the original matrix. In 
our experiments, the effect of ILUT was identical to that of ILU(O), so in theory, Bi-CGSTAB 
with ILUT on the original and the scaled systems should have behaved similarly. However, our 
experiments indicated that using GS produces much better numerical results, and this also holds 
for GMRES with ILUT. This is probably due to the fact that the scaled system does not have 
very large differences between coefficients, so it is much less prone to roundoff errors. 



2.1 Setup of the numerical experiments 

In two dimensions, the general form of the second-order differential equations in this study was 

^{a{x,y)u^) + -^{b{x,y)uy) + --- = F, 

where a and b are given functions of x and y, "■ ■ ■ " stands for lower-order derivatives, and F 
is a prescribed RHS. In three dimensions, there are three given functions a,b,c of x,y,z, and 
a second order partial derivative w.r.t. z was also included. Boundary conditions were either 
Dirichlet or mixed Dirichlet and Neumann. The regions were taken as the unit square or the 
unit cube. The discretization of the second-order derivatives at a given grid point (/, j) was done 
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using central differences; e.g., -^(aux) was approximated as 




All problems were discretized with equally-spaced grids, and the initial value was taken as 
= 0. The tests were run on a Pentium IV 2.8GHz processor with 3GB memory, running 
Linux. 



2.2 Stopping tests 

There are several stopping criteria which one may apply to iterative systems. Our stopping 
criterion was to use the relative residual: — Ajc||/||Z? — Ax'^H < e, where £ was taken as 10^^, 
10^^ and lO^^'^. In some of the cases, this was not attainable. Since this stopping criterion 
depends on the scaling of the equations, we always made this test on the geometrically-scaled 
system using the L2-norm. In order to limit the time taken by the methods implemented in 
AZTEC, the maximum number of iterations was set to 10,000. The AZTEC library has several 
other built-in stopping criteria: numerical breakdown, numerical loss of precision and numerical 
ill-conditioning. In the results, we denote the relative residual by rel-res, and non-convergence 
by "no conv." 

One should note that the test for numerical breakdown in AZTEC uses the machine precision 
DBL_EPSILON, and this may result in a premature notice of numerical breakdown in some cases. 
To get around this problem, we multiplied the variable brkdown_tol in the Bi-CGSTAB algo- 
rithm by some small factor, such as 10^^^. (brkdown_tol is normally set equal to DBL_EPSILON, 
which is approximately 2.22 x 10^^^ on our machine). 



3 Problem 1 



Problem 1 is taken from Saad fT6'i §3.7, problem F2DB]. It is a two-dimensional PDE 

d , , /, N didu) d(eu) 



where 

, . . ,103 if l<x,y<l 

a{x,y) = b[x,y) 



1 Otherwise 



and d{x,y) = I0{x + y) and e{x,y) = I0{x — y). The Dirichlet boundary condition w = is used 
on the boundary, and the RHS h is immaterial since the vector b of the linear system is chosen 
as b =Ae, where A is the system matrix and e = (1, . . . , 1)^. 



Table 3.1 shows the number of iterations and runtimes of the various algorithm/preconditioner 



methods, with and without GS, on a grid of 128 x 128. In cases of stagnation, the table shows the 
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relative residual that was achieved before stagnation. Note that GS enables the convergence of 
Bi-CGSTAB without ILUT; this is useful in a parallel setting, because Bi-CGSTAB is inherently 
parallel but ILUT is not an ideal parallel preconditioner. We can also see that GS is slightly 
helpful to Bi-CGSTAB with ILUT, and, for low-accuracy, also to GMRES with and without 
ILUT. Another observation is that when GMRES (with and without ILUT) stagnates before 
reaching the prescribed convergence goal, GS postpones the stage at which stagnation sets in, 
and enables convergence to a level that is acceptable for most practical applications. 





No. of iterations and time (in sec.) 


Method 


rel-res =10 ^ 


rel-res = 10 ^ 


rel-res =10^10 


Bi-CGSTAB 
with GS 


no conv. 
91 (0.30) 


no conv. 
299 (0.99) 


no conv. 
361 (1.19) 


Bi-CGSTAB-i-ILUT 
with GS 


31 (0.23) 
30 (0.23) 


107 (0.67) 
90 (0.59) 


142 (0.88) 
130 (0.81) 


GMRES 
with GS 


converged to 3.8 x 10 ^ 
265 (0.85) converged to 1.1x10"^ 


GMRES-i-ILUT 
with GS 


converged to 3.9 x 10^-^ 
39 (0.23) converged to 1.1x10^5 



Table 3.1: No. of iterations and runtimes for Problem 1. Grid size = 128x128. 



Three additional experiments, based on Problem 1, were also done: 

1. The values of a{x,y) and b{x,y) were increased to 10"^ in the inner square. The results 



were very similar to those shown in Table 3.1 but with slightly increased runtimes in the 
higher-accuracy cases. 

A "continuous" case: the values of a{x,y) and b{x,y) were taken as 1 throughout the unit 
square. Here, all the methods converged, and GS made very little difference. 
A second continuous case, with a{x,y) = b{x,y) = 1000. Table 3.2 shows the results of 
this experiment. As can be seen, GS made very little difference. 





No. of iterations and time (in sec.) 


Method 


rel-res =10 


rel-res = 10 ^ 


rel-res = 10^1^ 


Bi-CGSTAB 
with GS 


121 (0.41) 
123 (0.42) 


224 (0.76) 
217 (0.74) 


286 (0.96) 
261 (0.88) 


Bi-CGSTAB-i-ILUT 
with GS 


39 (0.28) 

40 (0.29) 


61 (0.41) 
60 (0.40) 


85 (0.56) 
85 (0.56) 


GMRES 
with GS 


1365 (4.31) 
1356 (4.28) 


3704 (11.69) 
3582 (11.45) 


6045 (19.19) 
5722 (18.16) 


GMRES-i-ILUT 
with GS 


140 (0.69) 
140 (0.69) 


359 (1.69) 
359 (1.69) 


579 (2.67) 
579 (2.67) 



Table 3.2: No. of iterations and runtimes for a continuous version of Problem 1, with a 
1000 everywhere. 



We turn now to studying the effect of GS on the distribution of the eigenvalues. For Problem 



1, this was done with a grid size of 40x40 for the purpose of a clear presentation. Table 3.3 
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shows the values of the minimum and maximum eigenvalues and the condition number, for the 
original, the scaled and the continuous (a = b = 1000) cases of Problem 1. Also shown for 
each case is the number of eigenvalues in the first interval (out of 100) in a histogram of the 
eigenvalue distribution. We can see that while the condition number is not changed much by 
the scaling, the number of eigenvalues in the first interval is reduced very significantly. 



Matrix 








No. of eigenvalues 
in first interval 


Original 


1.87E-2 


7.96E+3 


4.25E+5 


1126 


With GS 


7.07E-6 


1.78E+0 


2.52E+5 


4 


Cont. coef. 


1.17E+1 


8.00E+3 


6.81E+2 


8 



Table 3.3: Basic eigenvalue information for Problem 1. 



Figure 3.1 shows the distribution of the eigenvalues of Problem 1 at the full scale (left) and 
with a zoom (right). We can see that there is a congestion of eigenvalues close to zero. Figure 
3. 2| shows the distribution of the eigenvalues of Problem 1 after geometric scaling with the Li- 



norm (left) and the L2 (right). Both scalings produce fairly similar distributions; however we 
will present results throughout the paper for the L2 scaling. As to the continuous version of the 
problem, its eigenvalues were all real and distributed evenly in the range (0, 8000). 



Problem 1 : Eigenvalue distribution of 
original matrix 
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Problem 1 : Eigenvalue distribution of 
original matrix (zoom) 
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Figure 3.1: Eigenvalue distribution of the original matrix of Problem 1, with a zoom on the 
range 0-20. 

Figure [33] shows a histogram of the eigenvalue distributions for the original Problem 1, for the 
geometrically scaled problem, and for the continuous case with a = b= 1000. 



4 Problem 2 



Problem 2 is based on Example 2 from van der Vorst [1211 . to which we added convection terms. 
This example demonstrates that GS also works with mixed Dirichlet and Neumann boundary 
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Problem 1 : Eigenvalue distribution after 
geometric scaling (Li) 
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Problem 1 : Eigenvalue distribution after 
geometric scaling (L2) 
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Figure 3.2: Eigenvalue distribution for Problem 1 after geometric scaling with the Li- and the 
L2-norms. 



Problem 1 



original matrix 
geometric scaling (Lg) 
contin. coef. (a=b=1000) 




40 50 60 

Interval No. 



Figure 3.3: Histogram of eigenvalues for Problem 1, for the original matrix, the geometrically 
scaled matrix, and for a variation of Problem 1 with continuous coefficients {a = b = 1000). 



conditions. The governing PDF is 



d{D{x,y)ujc) d{D{x,y)uy 



dx 



dy 



+ aux + bUy = 1, 



with D{x,y) and boundary conditions as shown in Figure 4.4 The convection terms were a = 
b = 200. In the original example, the internal value of D is 10^, but we also tested internal 
values of 10^ and 10^. Also, the internal square in our case is somewhat smaller. The unit 
square was discretized with a grid size of 150 x 150. Together with the boundary equations, the 
system consisted of 22,952 equations. The resulting system is indefinite, with eigenvalues in 
the four quadrants of the imaginary plane. 

Tables 4.4, 43] and 4.6 show the number of iterations and runtimes of the various algorithm 
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(0,1) 3n (1,1) 



0.8 
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3n 



0.2 



D(x,y)=l 




D(x,y)=10^/lo'VlO? 









du^O 
3n 



(0,0) 02 u=0 0^8 (I'O) 
Figure 4.4: Description of Problem 2. 

and preconditioner combinations, with and without GS, for Z) = 10^, D — 10^, and Z) = 10^, 
respectively. When there was no convergence to the prescribed goal, the table shows the relative 
residual that was achieved. 





No. of iterations and time (in sec.) 


Method 


rel-res =10 ^ 


rel-res =10 ^ 


rel-res = 10^1° 


Bi-CGSTAB 
with OS 


no conv. 
591 (2.13) 


no conv. 
1025 (3.69) 


no conv. 
conv. to 7.23x10^1'^ 


Bi-CGSTAB+ILUT 
with OS 


92 (0.75) 
88 (0.72) 


139 (1.09) 
125 (1.00) 


215 (1.68) 
324 (2.45) 


GMRES 
with OS 


converged to 6.15 
converged to 2. 145 x 10""^ 


GMRES+ILUT 
with OS 


3361 (19.86) 


converged to 0.927 

converged to 3.67 x 10"^ 



Table 4.4: No. of iterations and runtimes for Problem 2 (internal D = 10^). 





No. of iterations and time (in sec.) 


Method 


rel-res = 10 


rel-res = 10 ^ 


rel-res = 10-1° 


Bi-CGSTAB 
with GS 


no conv. 
589 (2.01) 


no conv. 
707 (2.41) 


no conv. 
1493 (5.37) 


Bi-CGSTAB+ILUT 
with GS 


96 (0.76) 
79 (0.64) 


192 (1.44) 
121 (0.92) 


255 (1.86) 
163 (1.19) 


GMRES 
with GS 


converged to 1.84 
converged to 2. 19 x 10"^ 


GMRES+ILUT 
with GS 


converged to 0.924 
converged to 1 .90 x lO""^ 



Table 4.5: No. of iterations and runtimes for Problem 2 (internal D — 10^). 
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No. of iterations and time (in sec.) 


iVlclIlOU 


rel-res =10 


rel-res =10 ^ 


rel-res =10^10 


Bi-CGSTAB 

Willi 


no conv. 
224 (0.82) 


no conv. 
730 (2.65) 


no conv. 
conv. to 1.69x 10"^ 


Bi-CGSTAB+ILUT 

with 

Willi VJO 


96 (1.15) 
21 (0.23) 


192 (1.56) 
125 (0.98) 


conv. to 5.16x10^1° 
217 (1.68) 


GMRES 
with OS 


357 (1.38) 


converged to 1 .37 

converged to 2. 19 x 10^^ 


GMRES+ILUT 
with OS 


40 (0.32) 


converged to 0.90 

converged to 1.89x 10^^ 



Table 4.6: No. of iterations and runtimes for Problem 2 (internal D = 10^). 



The tables show that GS was helpful in all cases, with the exception of Bi-CGSTAB with ILUT 



in Table 4.4 , where it increased the number of iterations required in the highest accuracy case. A 
close examination of this case revealed that starting from iteration 220, BiCGSTAB with ILUT 
(after GS) was extremely oscillatory, with fluctuations of up to three orders of magnitude in the 



relative residual; the required convergence goal of 10 was almost attained at 225 iterations. 

The three tables show that the behavior of GS is not necessarily "smooth": consider the results 
of Bi-CGSTAB with ILUT for rel-res = 10^^*^: we can see that the goal was not reached for D = 
10^ andD= lO^but it was reached for D = 10^. The explanation for this is the large number of 



iterations required by Bi-CGSTAB in this case, as seen in Table 4.5 and the oscillatory nature of 
Bi-CGSTAB. The accumulated roundoff error is sometimes too large to reach the convergence 
goal, and while the updated error measured by the algorithm showed convergence, the true 
relative residual did not always decrease sufficiently. 

Another interesting point is the behavior of GS with GMRES and ILUT in the low- accuracy 
case: for D = 10^, there was convergence after many iterations, for D = 10^ there was no 
convergence, and then for D = 10^ there was a very fast convergence. In order to examine this 
behavior, we also tested this particular case with D = 10^, and we noticed that starting D = 10^, 
GS improved the convergence of GMRES by one order of magnitude for every increase in D 
by one order of magnitude. Not doubt, this phenomenon should be studied further. In any case, 
for the larger values of D, GS enabled the convergence of GMRES, with and without ILUT, to 
practical levels of accuracy. 



For the eigenvalue data, we discretized the domain by a grid of 40x40x40. Table 4.7 provides 
the basic eigenvalue information for Problem 2, for the original (D = 10^) and the scaled matri- 
ces. Also shown are the eigenvalues for a variation of Problem 2 with "continuous" coefficients, 
obtained by taking D = 10^ throughout the unit square. The last column of the table shows the 
number of eigenvalues whose real part lies in the same interval as the origin, when the range 
of (the real parts) of the eigenvalues is divided into 100 intervals. We can see that GS reduces 
the condition number by three orders of magnitude, and the reduction is even greater w.r.t. the 
continuous case. Furthermore, the number of eigenvalues around the origin is reduced very 
significantly, even below that of the continuous case. 



Figure 4.5 shows the distribution of the eigenvalues for Problem 2 with D = 10^, and for the 
geometrically scaled problem. It can be seen that the eigenvalues of the original matrix are very 
concentrated around the origin. In the scaled matrix, there are very few eigenvalues around the 
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Matrix 






^max/ -^min 


Eigenvalues 
around jc=0 


Original (D^IO^) 


3.33E-1 


1.34E+7 


4.02E+7 


1058 


With GS 


9.15E-5 


1.78E0 


1.95E+4 


8 


Continuous coef. 


1.48E-2 


1.34E+7 


9.09E+8 


130 



Table 4.7: Basic eigenvalue information for Problem 2. 



origin, but many eigenvalues have their real part around 0.6. 




Figure 4.5: Eigenvalue distribution for Problem 2, for the original and the scaled cases. 

Figure |43]provides another view of the eigenvalue distribution with a histogram of the real part 
of the eigenvalues for the original and the scaled matrices. 



5 Problem 3 



Problem 3 is also taken from van der Vorst [|2T1 Example 4]. It describes a certain groundwater 
flow problem which leads to a nonsymmetric system, with a complex geometry and several 
jumps in the discontinuities of the equations. This problem is well-known for its difficulty. The 
basic equation is the following: 



where A{x,y) and F are taken as shown in Figure 5.7[ and B{x,y) = 2exp{2{x^ + y'^)). The 



Dirichlet boundary conditions are taken as shown in Figure [5J| The unit square was discretized 
with a grid size of 128 x 128, resulting in 16,129 equations. 



Nothing converged without ILUT, so we present in Table 5.8 only the results with ILUT, with 



and without GS. On this difficult problem, GS reduced the runtime of Bi-CGSTAB with ILUT 
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Problem 2 

60 r 




Interval No. 

Figure 4.6: Histogram of the real part of the eigenvalues for Problem 2, for the original and the 
scaled cases. 

(0,1)^ u=0 ^(1,1) 

F = 

F = 

F = 

\M """^ 

A(x,y)=10'* 

A(x,y)=10"^ 
A(x,y)=10^ 

(0,0) u=l (1,0) 

Figure 5.7: Description of Problem 3. 

by about 25%. GS also enabled the convergence of GMRES with ILUT, though the achieved 
runtimes were an order of magnitude larger than those of the scaled Bi-CGSTAB with ILUT in 
the higher- accuracy cases. 





No. of iterations and time (in sec.) 


Method 


rel-res =10 ^ 


rel-res = 10 ^ 


rel-res =10^1° 


Bi-CGSTAB-i-ILUT 
with GS 


93 (0.60) 
67 (0.44) 


124 (0.79) 
90 (0.59) 


152 (0.95) 
112 (0.72) 


GMRES-i-ILUT 
with GS 


no conv. 
338 (1.58) 


no conv. 
1008 (4.61) 


no conv. 
1683 (7.71) 



Table 5.8: No. of iterations and runtimes for Problem 3. 
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For the eigenvalue data, we discretized Problem 3 with a grid of 40 x 40. The data for the 



original and the scaled matrices is summarized in Table 5.9 which also shows the number of 
eigenvalues in the first interval (out of 100) of the eigenvalue histogram. We can see that the 
condition number is decreased very significantly, and so is the number of eigenvalues in the first 
interval. 



Matrix 




^max 


■^max/ ^min 


No. of eigenvalues 
in first interval 


Original 


6.97E-5 


7.93E+4 


1.14E+9 


1284 


With GS 


1.21E-4 


1.78E+0 


1.47E+4 


167 



Table 5.9: Basic eigenvalue information for Problem 3. 
Figure [5^ shows the distribution of the eigenvalues for the original (with a zoom) and the scaled 



matrices of Problem 3. 



Figure 5.9 is a histogram of the eigenvalues of the original and the scaled matrices. We can 



see from Figures 5.8 and 5.9 that in the original matrix, the eigenvalues are very concentrated 
around the origin, while in the scaled matrix, many are "pushed" away from the origin. 



6 Problem 4 



This problem is based on a three-dimensional problem from Graham and Hagger [[8l, to which 
we added convection terms. The differential equation is the following: 



— -^{aux) — -^{auy) — —{auz) + dux + euy + fu^ = 0, 



^^{aux) - ^^{auy) - 

where the domain is the unit cube, and a{x,y,z) is defined as 

D if \<x,y, z<l 
1 otherwise. 



The Dirichlet boundary conditions are prescribed with w = 1 on the z = plane and w = on 
the other boundaries. The convection terms were taken as d = e = f = 100, and two values 
of D were tested: D = 10^ (as in the original problem) and D = 10^. Two grids were tested: 
40 X 40 X 40 and 80 x 80 x 80. The resulting linear systems are indefinite, with eigenvalues in 
the four quadrants of the imaginary plane. This problem will also be used to test the limit of 
usefulness of GS as the convection increases. 



Due to the many different cases, the data for this problem is presented differently. Table 6.10 
shows the number of iterations required for convergence, for a grid of 80x 80x 80, with D = 10^ 
and D = 10^. We can see that GS enabled the convergence of Bi-CGSTAB without ILUT, and 
it speeded up Bi-CGSTAB with ILUT quite significantly. GS also enabled the convergence of 
GMRES, with and withou ILUT, for rel-res = 10""^, and also for rel-res = 10"^ with D = 10^. 



Figures 6.10 and 6.11 show bar-plots of the runtimes for the two grid sizes, for 0=10^^ and D = 
10^, for the three levels of accuracy. In cases of stagnation, the figures show (in parentheses) 
the relative residual achieved before stagnation. 

The results can be summarized as follows. 
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Problem 3: Eigenvalue distribution of 
original matrix 



Problem 3: Eigenvalue distribution of 
original matrix (zoom) 
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Problem 3: Eigenvalue distribution after 
geometric scaling (L2) 
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Figure 5.8: Eigenvalue distribution for Problem 3, with a zoom to the region 0-800, and the 
distribution for the scaled matrix. 

• Bi-CGSTAB (without ILUT) and GMRES (with and without ILUT) did not converge in 
any of the cases. 

• GS enabled the convergence of Bi-CGSTAB in all cases, and the convergence of GMRES 
(with and without ILUT) in the low-accuracy cases and in one mid-accuracy case. 

• GS significantly improved the accuracy of GMRES (with and without ILUT) by postpon- 
ing the stage at which stagnation set in, enabling convergence to much higher levels of 
accuracy. 

• For the 80 x 80 x 80-grid, GS speeded up the convergence of Bi-CGSTAB with ILUT quite 
significantly. This was also true for the coarser grid, but only in the low-accuracy case. 

• In the above cases, GS (by itself) was generally a better preconditioner than ILUT for 
Bi-CGSTAB. 

• In all cases with GS, the results for D — 10^ were either similar to or better than those for 
D = 10^. 

• In summary, GS was beneficial in all cases to all the algorithm/ILUT combinations, and in 
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Figure 5.9: Histogram of the eigenvalues for Problem 3, for the original and the scaled cases. 





rel-res 


= 10-4 


rel-res 


= 10 7 


rel-res = 


= 10-10 


Method 


D=104 




D=104 


D= 10^ 


D= 10^ 


D= 10^ 


Bi-CGSTAB 














with GS 


112 


111 


262 


160 


406 


374 


Bi-CGSTAB+ILUT 


77 


52 


112 


139 


123 


169 


with GS 


13 


13 


52 


19 


84 


86 


GMRES 














with GS 


208 


207 




286 






GMRES+ILUT 














with GS 


28 


28 




46 






Note: " — " denotes no convergence to the prescribed accuracy. 



Table 6.10: No. of iterations for Problem 4, with ^ = 10^,^= 10^, and grid 80 x 80 x 80. 




many cases, GS provided a very significant advantage. 



Table 6.1 1 provides the basic eigenvalue information for Problem 4 with D = lO^^, for the orig- 
inal and the scaled matrices, and also for a continuous version with D = 10^ everywhere. The 
grid size for this data was 12x 12x 12. The the last column shows the number of eigenvalues 
whose real part lies in the same interval as the origin (out of 100 intervals). We can see that 
although GS doubled the condition number, it reduced the number of eigenvalues around the 
origin very significantly. 



Figure 6.12 shows the distribution of the eigenvalues for the original Problem 4, and a zoom 
to the region [—300,300]^. We can see that the eigenvalues are very concentrated around the 
origin. Figure 6.13| shows the distribution of the eigenvalues for the scaled and the continu- 
ous (D = 10^) cases. We can see that these distributions look quite similar, with eigenvalues 
concentrated around the perimeter. 



Figure 6.14 provides a histogram of the real part of the eigenvalues of Problem 4, for the origi- 
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Results for D = 10"^ 



Results for D= 10^ 




BiCGS 


1 

no conv. 








Relative Residual < 1.E-07 






with GS. 










1 1 1 




BiCGS+ILUT 






with GS. 






1 1 
1 1 


GIVIRES 


1 

no conv. 


(0.27) 


1 




with GS. 


no conv. 

1 


(1.77E-5) 


1 
1 




GMRE5+ILUT 


1 

no conv. 


( 0.029 ) 


1 




with GS. 


no conv. 

1 




(1.15E-5) 

1 




1 
1 
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Time [sec] 
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Figure 6.10: Runtimes and convergence status for Problem 4, grid = 40 x 40 x 40. 



Matrix 








No. of eigenvalues 
around x = 


Original 


1.45E-2 


2.93E+3 


2.02E+5 


1131 


With GS 


2.41E-6 


l.OOE+0 


4.16E+5 


25 


Cont. coef. (D=\0^) 


3.61E+0 


6.18E+4 


1.71E+4 


42 



Table 6.1 1: Basic eigenvalue information for Problem 4. 



16 



Results for D = 10"^ 



Results for D= 10^ 






1 




Relative Residual < 1.E-07 


BiCGS 


no conv. 




1 

^^^^^ 
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1 

(0.16) 


1 
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no conv. 


( 9.49E-6 ) 






GMRES-t-ILUT 


1 

no conv. 


1 

(1.88E-3) 


1 
1 
1 




with GS 


no conv. 




( 6.46E-6 ) 
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Figure 6.11: Runtimes and convergence status for Problem 4, grid = 80 x 80 x 80. 



nal, the scaled, and the continuous {D = 10"^) cases. The histogram provides an additional view 
of how the eigenvalues are distributed in the three cases. 

GS is proposed as being useful for discontinuous coefficients when the convection terms are 
small to moderate. In order to demonstrate this, we considered Problem 4 with D = 10^, a 
grid size of 40x40x40, and varying convection terms. The previous results were based on 
convection terms of 100, so we also tested this case with convection terms of 200, 500, and 



1000. The results, which are summarized in Table 6.12 below, show how the usefulness of GS 
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Problem 4: Eigenvalue distribution of 
original matrix 



^ -1000 




-1000 1000 

Real component 



Problem 4: Eigenvalue distribution of 
original matrix (zoom) 




-100 100 
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Figure 6.12: Eigenvalue distribution for Problem 4, with a zoom to the region [—300, 300]^. 



Problem 4: Eigenvalue distribution after Problem 4: Eigenvalue distribution of 

geometric scaling (L 2) continuous case (D=1 .E+04) 




-1 -0.75 -0.5 -0.25 0.25 0.5 0.75 1 -62000 -31000 31000 62000 

Real component Real component 



Figure 6.13: Eigenvalue distribution for Problem 4, for the scaled and the continuous cases, 
degrades as the convection terms are increased. 

Note in particular the difference between Bi-CGSTAB without ILUT and Bi-CGSTAB with 
ILUT: it turns out that without ILUT, GS enables convergence to 10^^ even with convection 
= 500, but with ILUT, degradation already appears with convection = 200; this degradation is 
independent of the use of GS. 
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Problem 4 




Figure 6.14: Histogram of the real part of the eigenvalues for Problem 4, for the original, the 
scaled, and the continuous cases. 



Method / Convection: 


100 


200 


500 


1000 


Bi-CGSTAB 
with GS 


10-10 


10-10 


10-10 




Bi-CGSTAB+ILUT 
with GS 


10-10 
10-10 


10-4 
10-4 






GMRES 
with GS 


10-4 


10-4 






GMRES+ILUT 
with GS 


10-4 








Note: ' — ' means no convergence. The numbers 
indicate which relative error goal was obtained. 



Table 6.12: Degradation of the various methods as the convection terms are increased. 



7 Conclusions and further research 



In this paper we have reported on a very simple technique for improving the convergence prop- 
erties of algorithms on certain linear systems with discontinuous coefficients. The technique, 
called geometric scaling (GS), consists of simply dividing each equation by the Lp-norm of its 
vector of coefficients. GS, with p = 2, was tested on four nonsymmetric problems derived from 
convection-diffusion elliptic PDEs, with small to moderate convection terms. Bi-CGSTAB and 
restarted GMRES, with and without ILUT, were tested on the four problems with and without 
GS. Normally, such problems are solved by domain decomposition (DD) techniques, but these 
can be difficult to implement if the boundaries between the subdomains have a complicated 
geometry or the grid is unstructured. The problems were taken from (or based on) well-known 
examples from the literature, and included two- and three-dimensional cases, with Dirichlet 
and mixed Dirichlet/Neumann boundary conditions. One problem had a complicated geometry. 
Three different convergence goals were prescribed: relative residual < 10-4, 10-^, lO-io. 
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Table 7.13 summarizes the convergence behavior of the four different algorithm/preconditioner 
combinations, with and without GS, on the four different problems, for the three convergence 
goals. The results indicate that GS was very useful in improving the convergence status of 
the different solution methods on the tested problems. With one exception, when the solution 
method converged on a problem with discontinuous coefficients, GS speeded up the conver- 
gence. Eigenvalue distribution maps show that when there is a strong concentration of values 
around the origin, GS reduces this concentration very significantly and "pushes" the values 
away from the origin. 



Method 


Problem 1 


Problem 2 


Problem 3 


Problem 4 


Bi-CGSTAB 
with GS 


+ + + 


7.2x10-10 




+ + + 


Bi-CGSTAB+ILUT 
with GS 


+ + + 


+ + * 


+ + + 
* * * 


+ + + 
* * * 


GMRES 
with GS 


3.8x10-2 
1.1x10-5 


1.37 

2.2x10-5 




0.27 
1.8x10-5 


GMRES+ILUT 
with GS 


3.9x10-3 
1.1x10-5 


0.90 

1.9x10-5 


+ + + 


0.29 

1.25x10-5 


Note: '— ' means no convergence, '+' means convergence, means 
better convergence. The numbers indicate the best relative error obtained. 



Table 7.13: Summary of convergence of the different methods on the four problems, for the 
three convergence goals (10-^, 10-^, and 10-^*^). 

This study is only a first report on this subject, and much work remains to be done. On the 
theoretical side, we have seen that GS improves the eigenvalue distribution, but further analysis 
is still needed. On the practical side, GS needs to be tested in conjunction with other algorithms 
and preconditioners, on various other difficult problems. 

A natural question that arises is how does GS compare with DD techniques? Over the years, 
these methods have achieved a high level of efficiency, and a head-on runtime comparison 
is very much problem-dependent and a topic for further research. However, an even more 
interesting topic is the combination of the two methods: apply GS to the equations and then 
apply some standard DD method; hopefully, on some problems, GS could also improve the 
convergence properties of some DD approaches. 

GS can also help the parallelization of solution methods in several ways. Firstly, GS itself is a 
parallel computational step. Secondly, it enables the partitioning of a domain into subdomains 
along boundaries that do not necessarily follow the physical boundaries of the problem; this 
way, better load balancing can be achieved. Thirdly, as seen in some of the cases, GS enabled 
the convergence of algorithms that are inherently parallel without the need for a preconditioner 
(such as ILUT) that may not be ideal for parallelism. 

Another topic for further research is the application of the (sequential) CGMN algorithm [HI HI 
and its block-parallel version CARP-CG [El on problems that have discontinuous coefficients 
and are also strongly convection-dominated. These algorithms have already been shown to be 
very effective on convection-dominated problems and initial experiments with discontinuous 
coefficients have yielded good results. The reason for this may be that these two methods are 
essentially conjugate-gradient acceleration of the Kaczmarz algorithm, and as such, GS (with 
the L2-norm) is already inherent in them. 
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